docs(retrieval): add fine-tuning guide #2306
Open
oliverholworthy wants to merge 26 commits into
Signed-off-by: Ronay Ak <ronaya@nvidia.com> Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
jgerh
reviewed
Jun 3, 2026
jgerh
left a comment
Contributor
Completed tech pubs review of .md files and provided some copyedits and suggested text revisions.
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
lbliii
added a commit
that referenced
this pull request
Jun 8, 2026
Ports Oliver Holworthy's runnable bi-encoder/cross-encoder retrieval fine-tuning guide (open PR #2306, authored in pre-migration Sphinx format) into the Fern MDX structure as docs/guides/llm/retrieval-finetune.mdx, registered under Recipes with slug retrieval-finetune. This resolves the 5 dangling /recipes-e2e-examples/retrieval-finetune links in the embedding/reranker model-coverage pages (added by #2392), which referenced a guide that was never merged to main. Converted MyST constructs to Fern: :::{warning} -> <Warning>, {download} roles -> GitHub source links, and ../*.md links -> version-agnostic Fern paths. #2306 can be closed as superseded once this lands. Co-Authored-By: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
akoumpa
pushed a commit
that referenced
this pull request
Jun 11, 2026
…2391) * docs(fern): relocate Fern under docs/ and remove legacy Sphinx tree Move the Fern build infrastructure from the top-level fern/ directory into docs/fern/ (config, theme, components, version navs, and the frozen 0.4.0 snapshot), delete the legacy Sphinx scaffolding (conf.py, autodoc parser, project.json, versions1.json), and retire the Sphinx publish workflows in favor of Fern CI. The bleeding-edge docs/ markdown pages are intentionally left as .md here; they are converted to .mdx in the follow-up PR so the conversion lands as a per-file rename+edit diff. docs.yml ships the latest + v0.4 trains (both mount the frozen 0.4.0 pages); the nightly train is enabled in the follow-up PR alongside the converted content. Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(fern): convert nightly docs from Markdown to MDX under docs/ (#2392) Rename each docs/*.md page to docs/*.mdx and apply the Sphinx/MyST -> Fern MDX conversion (YAML frontmatter, version-agnostic links, Fern components), so every page shows as a single rename+edit diff that git log --follow can trace. Enable the nightly version train (nightly.yml + docs.yml versions) mounting the converted top-level docs/ tree. Page content matches origin/main, including #2289's checkpoint-export updates. Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> * ci(fern): source frozen doc versions from a docs-archive branch at build time (#2400) Keep only the nightly tree (top-level docs/) on main and move the frozen v0.4 GA snapshot off main onto the long-lived `docs-archive` branch. A new `stitch-fern-versions` composite action restores the archived pages/ trees into the working copy before every Fern build, so the multi-version site still publishes intact. 
Why a build-time stitch is required: Fern has no native cross-ref sourcing for a version train's prose — `fern generate --docs` reads the single local working tree, and publish is a full-site snapshot (a train missing from the tree is unpublished). `fern check` likewise fatals on any referenced path that is absent. So the frozen pages must be physically present at build time. - Add .github/actions/stitch-fern-versions: ref-agnostic restore via `git fetch --depth=1 origin <ref>` + `git restore --source=FETCH_HEAD`. The registry value is an opaque git ref, so a branch (default) or an immutable docs/v* tag both work; hard-fails if the ref/subtree is missing. - Run the stitch in publish-fern-docs.yml, fern-docs-ci.yml, and fern-docs-preview-build.yml (before fern check / generate / artifact upload). docs.yml and the nav YAMLs always come from the live checkout. - Remove docs/fern/versions/v0.4/pages (143 files) from main; preserved on the docs-archive branch. Gitignore the restore path. - Add `make docs-stitch` (ARCHIVE_REF overridable) and make docs / docs-check / docs-preview depend on it for local dev. - Update the fern-docs SKILL.md and docs/fern/README.md: archived-versions flow, cutting-a-new-train steps, CI diagram, debugging, and references. Scope: Fern fern/ pipeline only. The legacy S3/Akamai Sphinx publish (release-docs.yml, release-nightly-docs.yml) is intentionally untouched. Stacked on #2392 (convert-docs-to-mdx); the migration stack must merge first. Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> * fix(docs): drop unrelated source churn from the relocation The relocation branch was committed on a stale base (predating main's save_consolidated migration), so its diff carried 121 unrelated files under nemo_automodel/, tests/, examples/, and tools/ that would have reverted main on merge. Restore them to main's current versions. 
The docs relocation, MDX conversion, and archive/stitch content are unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): drop stray distributed-training skill edit Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): render landing quick links as a Cards grid Replace the inline pipe-separated emoji link list on the docs landing page with a Fern <Cards> grid (body-less cards), matching the page's existing navigation idiom. Labels and hrefs are unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): render landing quick links as small Button row Swap the Cards grid for a row of Fern <Button small outlined> links — slimmer, native component, labels and hrefs unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): add Quick Links header above the landing button row Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): pin Recipes section slug to recipes-e2e-examples The "Recipes & E2E Examples" section had no explicit slug, so Fern auto-derived it as "recipes-e-2-e-examples" (the slugifier splits the "E2E" digit). Every internal link — the landing quick-link buttons and the "Recipes & Guides" table — targets /recipes-e2e-examples/..., so the whole section 404'd. Pin the slug across nightly, v0.4, and latest navs so the intended URL resolves on every version train. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): fix broken internal links surfaced by full link audit - nemotron-3-ultra: /launcher/slurm -> /job-launchers/slurm-cluster - vlm model coverage: step-3-7 -> step-3-7-flash (actual page slug) - convert GitHub links to removed Sphinx docs/*.md into internal links: docs/launcher/slurm.md -> /job-launchers/slurm-cluster (8x), docs/guides/llm/dataset.md -> /datasets/text-dataset, docs/guides/dataset-overview.md -> /datasets/overview, docs/guides/vlm/dataset.md -> /datasets/multi-modal-dataset (2x) Audited all 142 internal link targets against the nav route set; only the (not-yet-written) Embedding/Reranking guide remains, tracked separately. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): add retrieval fine-tuning guide (port of #2306) Ports Oliver Holworthy's runnable bi-encoder/cross-encoder retrieval fine-tuning guide (open PR #2306, authored in pre-migration Sphinx format) into the Fern MDX structure as docs/guides/llm/retrieval-finetune.mdx, registered under Recipes with slug retrieval-finetune. This resolves the 5 dangling /recipes-e2e-examples/retrieval-finetune links in the embedding/reranker model-coverage pages (added by #2392), which referenced a guide that was never merged to main. Converted MyST constructs to Fern: :::{warning} -> <Warning>, {download} roles -> GitHub source links, and ../*.md links -> version-agnostic Fern paths. #2306 can be closed as superseded once this lands. 
Co-Authored-By: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(readme): restore save_consolidated: final in checkpoint sketch Same stale-base leak as the distributed-training SKILL.md fix: the relocation commit snapshotted a pre-#2289 copy where save_consolidated was still `true`, diverging from main's `final`. Restore `final` to match main (flagged in review on #2391). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * test: scan docs/model-coverage for .mdx after fern relocation Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com> Co-authored-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
HuiyingLi
pushed a commit
to khazic/Automodel_lao
that referenced
this pull request
Jun 13, 2026
…VIDIA-NeMo#2391) * docs(fern): relocate Fern under docs/ and remove legacy Sphinx tree Move the Fern build infrastructure from the top-level fern/ directory into docs/fern/ (config, theme, components, version navs, and the frozen 0.4.0 snapshot), delete the legacy Sphinx scaffolding (conf.py, autodoc parser, project.json, versions1.json), and retire the Sphinx publish workflows in favor of Fern CI. The bleeding-edge docs/ markdown pages are intentionally left as .md here; they are converted to .mdx in the follow-up PR so the conversion lands as a per-file rename+edit diff. docs.yml ships the latest + v0.4 trains (both mount the frozen 0.4.0 pages); the nightly train is enabled in the follow-up PR alongside the converted content. Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(fern): convert nightly docs from Markdown to MDX under docs/ (NVIDIA-NeMo#2392) Rename each docs/*.md page to docs/*.mdx and apply the Sphinx/MyST -> Fern MDX conversion (YAML frontmatter, version-agnostic links, Fern components), so every page shows as a single rename+edit diff that git log --follow can trace. Enable the nightly version train (nightly.yml + docs.yml versions) mounting the converted top-level docs/ tree. Page content matches origin/main, including NVIDIA-NeMo#2289's checkpoint-export updates. Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> * ci(fern): source frozen doc versions from a docs-archive branch at build time (NVIDIA-NeMo#2400) Keep only the nightly tree (top-level docs/) on main and move the frozen v0.4 GA snapshot off main onto the long-lived `docs-archive` branch. A new `stitch-fern-versions` composite action restores the archived pages/ trees into the working copy before every Fern build, so the multi-version site still publishes intact. 
Why a build-time stitch is required: Fern has no native cross-ref sourcing for a version train's prose — `fern generate --docs` reads the single local working tree, and publish is a full-site snapshot (a train missing from the tree is unpublished). `fern check` likewise fatals on any referenced path that is absent. So the frozen pages must be physically present at build time. - Add .github/actions/stitch-fern-versions: ref-agnostic restore via `git fetch --depth=1 origin <ref>` + `git restore --source=FETCH_HEAD`. The registry value is an opaque git ref, so a branch (default) or an immutable docs/v* tag both work; hard-fails if the ref/subtree is missing. - Run the stitch in publish-fern-docs.yml, fern-docs-ci.yml, and fern-docs-preview-build.yml (before fern check / generate / artifact upload). docs.yml and the nav YAMLs always come from the live checkout. - Remove docs/fern/versions/v0.4/pages (143 files) from main; preserved on the docs-archive branch. Gitignore the restore path. - Add `make docs-stitch` (ARCHIVE_REF overridable) and make docs / docs-check / docs-preview depend on it for local dev. - Update the fern-docs SKILL.md and docs/fern/README.md: archived-versions flow, cutting-a-new-train steps, CI diagram, debugging, and references. Scope: Fern fern/ pipeline only. The legacy S3/Akamai Sphinx publish (release-docs.yml, release-nightly-docs.yml) is intentionally untouched. Stacked on NVIDIA-NeMo#2392 (convert-docs-to-mdx); the migration stack must merge first. Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> * fix(docs): drop unrelated source churn from the relocation The relocation branch was committed on a stale base (predating main's save_consolidated migration), so its diff carried 121 unrelated files under nemo_automodel/, tests/, examples/, and tools/ that would have reverted main on merge. Restore them to main's current versions. 
The docs relocation, MDX conversion, and archive/stitch content are unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): drop stray distributed-training skill edit Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): render landing quick links as a Cards grid Replace the inline pipe-separated emoji link list on the docs landing page with a Fern <Cards> grid (body-less cards), matching the page's existing navigation idiom. Labels and hrefs are unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): render landing quick links as small Button row Swap the Cards grid for a row of Fern <Button small outlined> links — slimmer, native component, labels and hrefs unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): add Quick Links header above the landing button row Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): pin Recipes section slug to recipes-e2e-examples The "Recipes & E2E Examples" section had no explicit slug, so Fern auto-derived it as "recipes-e-2-e-examples" (the slugifier splits the "E2E" digit). Every internal link — the landing quick-link buttons and the "Recipes & Guides" table — targets /recipes-e2e-examples/..., so the whole section 404'd. Pin the slug across nightly, v0.4, and latest navs so the intended URL resolves on every version train. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): fix broken internal links surfaced by full link audit - nemotron-3-ultra: /launcher/slurm -> /job-launchers/slurm-cluster - vlm model coverage: step-3-7 -> step-3-7-flash (actual page slug) - convert GitHub links to removed Sphinx docs/*.md into internal links: docs/launcher/slurm.md -> /job-launchers/slurm-cluster (8x), docs/guides/llm/dataset.md -> /datasets/text-dataset, docs/guides/dataset-overview.md -> /datasets/overview, docs/guides/vlm/dataset.md -> /datasets/multi-modal-dataset (2x) Audited all 142 internal link targets against the nav route set; only the (not-yet-written) Embedding/Reranking guide remains, tracked separately. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): add retrieval fine-tuning guide (port of NVIDIA-NeMo#2306) Ports Oliver Holworthy's runnable bi-encoder/cross-encoder retrieval fine-tuning guide (open PR NVIDIA-NeMo#2306, authored in pre-migration Sphinx format) into the Fern MDX structure as docs/guides/llm/retrieval-finetune.mdx, registered under Recipes with slug retrieval-finetune. This resolves the 5 dangling /recipes-e2e-examples/retrieval-finetune links in the embedding/reranker model-coverage pages (added by NVIDIA-NeMo#2392), which referenced a guide that was never merged to main. Converted MyST constructs to Fern: :::{warning} -> <Warning>, {download} roles -> GitHub source links, and ../*.md links -> version-agnostic Fern paths. NVIDIA-NeMo#2306 can be closed as superseded once this lands. 
Co-Authored-By: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(readme): restore save_consolidated: final in checkpoint sketch Same stale-base leak as the distributed-training SKILL.md fix: the relocation commit snapshotted a pre-NVIDIA-NeMo#2289 copy where save_consolidated was still `true`, diverging from main's `final`. Restore `final` to match main (flagged in review on NVIDIA-NeMo#2391). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * test: scan docs/model-coverage for .mdx after fern relocation Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com> Co-authored-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com> Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
What does this PR do?
Adds a retrieval fine-tuning guide and supporting retrieval utilities/tests for bi-encoder and cross-encoder workflows, including custom data formats, hard-negative mining, export/reload handoff, cache validation, and operational safety checks.
Changelog
Before your PR is "Ready for review"
Pre checks:
Validated locally:
- `uv run ruff format ...`
- `uv run ruff check --fix ...`
- `uv run pytest tests/unit_tests/recipes/test_mine_hard_negatives.py tests/unit_tests/datasets/llm/test_materialize_hf_retrieval_subset.py -q`
- `uv run pytest tests/unit_tests/_transformers/test_retrieval.py tests/unit_tests/recipes/test_mine_hard_negatives.py tests/unit_tests/datasets/llm/test_audit_mined_negatives.py tests/unit_tests/datasets/llm/test_materialize_hf_retrieval_subset.py tests/unit_tests/models/bi_encoder/test_llama_bidirectional_model.py -q`
- `uv run --group docs sphinx-build -b dummy docs /tmp/automodel-docs-review-final-final -W --keep-going`
- `git diff --check`

Additional Information
`/ok to test <commit-sha>` depending on branch trust/signature policy.